Linear scaling computation of the Fock matrix VII. 
Periodic Density Functional Theory at the r-point. 
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Linear scaling quantum chemical methods for Density Functional Theory are extended to the 
condensed phase at the F-point. For the two-electron Coulomb matrix, this is achieved with a 
tree-code algorithm for fast Coulomb summation [J. Chem. Phys. 106, 5526 (1997)], together 
with multipole representation of the crystal field [J. Chem. Phys. 107, 10131 (1997)]. A periodic 
version of the hierarchical cubature algorithm [J. Chem. Phys. 113, 10037 (2000)], which builds a 
telescoping adaptive grid for numerical integration of the exchange-correlation matrix, is shown to 
be efficient when the problem is posed as integration over the unit cell. Commonalities between 
the Coulomb and exchange-correlation algorithms are discussed, with an emphasis on achieving 
linear scaling through the use of modern data structures. With these developments, convergence 
of the r-point supercell approximation to the k-space integration limit is demonstrated for MgO 
and NaCl. Linear scaling construction of the Fockian and control of error is demonstrated for 
RBLYP/6-21G* diamond up to 512 atoms. 
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I. INTRODUCTION 

Quantum chemical methods that employ Gaussian- 
Type Atomic Orbitals (GTAOs) offer a number of advan- 
tages in materials science. First, because they are local 
basis functions, it is possible to achieve a linear scaling 
cost with system size for insulating systems. Secondly, al- 
most all one- and two-electron integrals involving GTAOs 
are analytic, enabling the rapid evaluation of expecta- 
tion values involving complicated operators that are often 
involved in the computation of response properties'^. 
The DALTON quantum chemical program^ is a premier 
example of this capability, offering a wide range of electric 
and magnetic molecular response properties. The ability 
to treat core-states analytically also opens the ability to 
go beyond the pseudopotential approximation in com- 
putation of relativistic effects with the four-component 
Dirac-Hartree-Fock££ and Dirac-Kolm-Sharn 7 = theories. 
Perhaps most important though, the exact Hartree-Fock 
(HF) exchange may be computed efficiently with a GTAO 
basis set. In addition to providing a reference for corre- 
lated wavefunction methods, the exact HF exchange is 
central to hybrid HF/DFT modekftW 1 . The use of 
hybrid methods in the condensed phase, pioneered by the 
CRYSTAL groupie, has proven to be a useful improve- 
ment beyond the generalized gradient approximation for 
a number of properties, including bulk geometries, elec- 
tronic propertiesi&i^ and absorption energie a 15 ' 16 . 

Recently, we have developed linear scaling quantum 
chemical methods for gas phase Density Functional The- 
ory (DFT), including computation of the Coulomb ma- 
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trix Jil and the exchange-correlation matrix K x <^&. In 
this contribution, these linear scaling methods are ex- 
tended to periodic boundary conditions at the r-point. 

With periodic linear scaling quantum chemical algo- 
rithms, it is possible to begin bridging the gap between 
methods developed for small molecule chemistry and 
large scale problems in the solid state. Together with 
the results presented here, O(N) methods for solving the 
Self-Consistent-Field equations 1 ^^ and linear scaling al- 
gorithms for computing the periodic HF exchange 2 ^, it is 
now possible to perform condensed phase HF/DFT cal- 
culations on systems larger than 500 atoms with a single 
processor. In addition, with the advent of linear scaling 
density matrix perturbation theorjiS 2 ^, well developed 
quantum chemical methods for the analytic computation 
of response properties may be brought to bear on large 
solid state problems. 

This paper is organized as follows: In Section [HI 
periodic boundary conditions and the r-point approx- 
imation are introduced. Next, in Section IIIII the re- 
lationship between the numerical error estimates and 
data structures that underly the fast linear scaling al- 
gorithms for computation of the Coulomb and exchange- 
correlation matrix are outlined. In Section IIVI we ex- 
tend previous work on the Niboer and De Wett o 24 ' 25 lat- 
tice sum method to linear scaling computation of quan- 
tum Coulomb sums and tin-foil boundaries. Then, in 
Section El 0(N) methods for computing the GTAO- 
based exchange-correlation matrix are presented. In Sec- 
tion IVI Al we discuss the implementation of these de- 
velopments in the MondoSCF— suite of linear scal- 
ing quantum chemistry codes. In Section IVI Bl com- 
parison of the r-point results is made with those ob- 
tained with CRYSTAL98 using k-space integration for 
NaCl and MgO. Next, in Section IYl CI linear scaling is 
demonstrated for construction of the diamond Fock ma- 
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trix at the RBLYP/6-21G* level of theory. Finally, in 
Section IV ill we present our conclusions. 

II. PERIODIC BOUNDARY CONDITIONS, 
LINEAR SCALING AND BASIS SETS 

In the conventional implementations of periodic 
boundary conditions, the Bloch functions 

V, a k (r)=]>>^ a (r-R), (1) 

R 

are often constructed from non-orthogonal functions lo- 
cal to the unit cell (UC). Here, the local function <f> a 
is a Gaussian- Type Atomic Orbital (GTAO) centered on 
atom A, while the sum on R runs over the Bravais lattice 
defined by integer translates of the primitive lattice vec- 
tors a, b and c. These Bloch functions (crystal orbitals) 
yield all possible translational symmetries through varia- 
tion of the reciprocal lattice vector k. Programs such as 
CRYSTAL98 perform a careful sampling of reciprocal 
space to achieve an accurate description of the periodic 
system. An alternative approach to including these im- 
portant symmetries is to set k = 0, and then use a larger 
supercell created through replication and translation of 
the primitive unit cell. This is the supercell T-point ap- 
proximation, used primarily for the study of defects and 
vacancies rather than as a replacement for k-space inte- 
gration. 

In this contribution, O(N) algorithms are developed 
specifically for the the T-point approximation, allowing 
the use of large supercells in the case of high symmetry, 
as well as large primary cells in the case of disordered sys- 
tems. While k dependence is avoided, lattice summation 
and formal integration over the unit cell volume, Vucj are 
retained. At first sight this would seem to make matrix 
construction quite different than in the gas phase, where 
integrals are typically taken over all space, Voo- Thus, 
elements of the gas phase overlap matrix, 

S ab = f dr (t> a {r)4> b {T) , (2) 

become 

S ab =Y, I dr&(r + R)&(r + R') (3) 

R,R' 

in the periodic T-point regime. However, this formalism 
can be brought into a form more closely related to its 
quantum chemical counterpart via the transformation, 

V / dr/(r + R)^ f dr/(r), (4) 

R JVue JVoc 

allowing use of conventional analytic integral technolo- 
gies. For example, elements of the periodic overlap ma- 
trix become 

5 afc = V/ dr0 a (r)0 b (r + R). (5) 

R J V~ 



For compactness of notation, let us define the inter- 
mediate basis function products (distributions) p &(r) = 
J^rr' <£a (r + R)0&(r + R') associated with integration 
over Vuc and the corresponding distributions /0J|(r) — 
Sr 0a( r )0o(r + R) associated with integration over 
Voo. We likewise define the electron density p(r) — 
Safe PabPabi?) associated with integration over Vuc and 
the corresponding density p°°(r) = J2 a b P ab Pab( r ) asso- 
ciated with integration over Voo, where P ao is the one- 
electron reduced density matrix. In this convention, Voo 
is the default volume of integration, and elements of the 
periodic overlap matrix are expressed simply as S ab = 
J drp~ b (r), while the electron count is N c \ = J drp°°(r). 

It is worth noting that the complexity of p°° is O(N), 
due to the exponential prefactor e _Xat, ( A ~ B ~ R ) that en- 
ters each term in the sum over A, B and R. Thus, N- 
scaling may be achieved a priori with a simple distance 
test. However, for small exponents, care must be ex- 
ercised in truncation of periodic sums to avoid overlap 
matrices that are not positive definite. While these sit- 
uations can often be ameliorated with a tighter distance 
neglect criteria, they are typically a symptom of near lin- 
ear dependence, often due to the use of basis sets designed 
for gas phase calculations in conjunction with small unit 
cells. 

These considerations and others are discussed by 
Towler in an excellent overview of Gaussian basis sets 
for the condensed phased. Also, there are at least two 
(albeit related) librariesS&Si of Gaussian basis sets appro- 
priate for materials at standard densities. For high densi- 
ties though, these basis sets may still encounter problems 
with linear dependence and sensitivity to truncation. 
One solution to this problem, suggested by Gruneich and 
Hesa^i for even tempered basis sets, is to scale the expo- 
nents by the inverse square of the lattice constant. In 
many cases though, especially for large systems, stan- 
dard quantum chemical basis sets work well. 



III. DATA STRUCTURES AND ERROR 
ESTIMATES 

Both the Quantum Chemical Tree-Code (QCTC)±£ for 
computing the Coulomb matrix and Hierarchical Cuba- 
ture (HiCu)A& for computing the Exchange-Correlation 
matrix are fast O(NlgN) algorithms whose performance 
is coupled to underlying data structures and error es- 
timates. It is important to understand some of these 
particulars first, before addressing their extension to pe- 
riodic boundary conditions. Also, the current version of 
QCTC is quite different from previous descriptions, and 
deserves some introduction. 

Both QCTC and HiCu are homeomorphic, involving 
k-d tree representation of the density. In our implemen- 
tations, k-d trees are doubly linked lists with axis aligned 
bounding boxes (AABBs) delimiting the spatial extent 
of each node and its children. This scheme is similar 
to well developed technologies for ray tracing and data 
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base searches, allowing fast OilgN) range queries of over- 
lapping components through AABB intersection tests^l. 
In the case of QCTC, this fast look up constitutes the 
penetration acceptability criterion (PAC) which identi- 
fies spatial clusters or agglomerations, pq, of the density 
that may be accurately represented via a multipole ap- 
proximation due to the absence of charge-charge pene- 
tration effects. 

For accepted clusters a second test, the multipole ac- 
ceptability criterion (MAC), is performed to check trans- 
lation errors in the multipole expansion. This second 
test is critical to the overall accuracy of the Coulomb 
matrix build. We have recently developed a new MAC 
in Ref. [s3 that has several advantages. First, it takes 
into account the magnitude or weight of the distribution 
within the cluster. Second, it correctly takes into account 
the angular symmetry of the primitive Gaussian distribu- 
tions. Third, and most important, it is always an exact 
bound to the translation error. 

For each primitive bra distribution p a b, a fast range 
query is performed on the fc-d tree representation of the 
total density, leading to an on the fly partition of near- 
field (NF) and far-field (FF) interactions in construction 
of the gas phase Coulomb matrix which may be written 

QeFF l rn 



V m' 



£ JdvJ dv'p ab {v)\v-v'\- l p q {v), 



(6) 
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where M l m is the irregular solid harmonic interaction ten- 
sor, O l m [f] — J drO l m (r)f(r) is a moment of the regular 
solid harmonic, Q runs over the highest possible nodes 
in the density-tree consistent with the PAC and MAC, 
and q runs on leftover near-field primitive distributions 
in the density. See Refs. p"7H33| for further details on this 
representation. 

A fundamental difference between QCTC and FMM 
based methods is that QCTC pushes the near /far- field 
partition to the limit, employing the PAC and MAC best 
case error estimates to resolve individual primitive distri- 
butions. On the other hand, FMM based methods em- 
ploy static, worst case error estimates. While recurring 
down the density tree to the level of individual primi- 
tives precludes well developed technologies for the inte- 
gral evaluation of contracted functions, it accelerates the 
onset of linear scaling through early clustering. 

The Quantum Chemical Tree Code generally employs 
the total density, which simplifies the code, allows elec- 
trostatic screening in MAC error estimates and provides 
charge neutrality, an essential feature for periodic calcu- 
lations. Thus, the Coulomb matrix employed here in- 
cludes the electron-nuclear terms; J = J cc + V en . 

In the case of HiCu, two separate fc-d tree structures 
are used. The rho-tree holds the electron density, while 
the cube-tree contains a hierarchical grid for integration 
of the exchange-correlation potential. Each node of the 
cube-tree is composed of a Cartesian non-product inte- 



gration rule with the grid points enclosed by it's AABB. 
The cube-tree is constructed iteratively through recur- 
sive bisection of the primary volume (the root AABB), 
using exact error bounds to achieve arbitrary precision 
of the integrated density and its gradients. As the cube- 
tree is extended, AABB intersection tests are performed 
while traversing the rho-tree, avoiding parts of the den- 
sity that do not overlap with that portion of the grid. 
Upon construction of the grid, the reverse procedure is 
carried out; for each primitive distribution, the cube-tree 
is walked selecting only overlapping portions of the grid 
via the AABB intersection test. 

For both of these fast algorithms, the trade off between 
efficiency and accuracy is controlled by the AABB, which 
in turn depends on the the extent or range R q of a primi- 
tive Gaussian distribution p qi beyond which it is assumed 
to be negligible. Of course, negligible depends on the use 
to which the distribution is put, as will become obvious 
in the following. 

Both HiCu and QCTC employ the Hermite Gaussian 
representation of distributions 3 ^ 



Pq 



(r) = Ydi mn K q lmn {Y), 



where 



A Ln( r ) = 



8Q l x dQ™dQ- 



,-CO-Q) 



(7) 



(8) 



This representation provides an intermediate form into 
which elements of the density matrix may be folded, 
and allows the use of McMurchie-Davidson recurrence 
relations 35 in analytic integral evaluation and density 
evaluation. For this form, Cramer's inequality 36 provides 
a bound for the behavior of a Hermite-Gaussian distri- 
bution: 



Pq 



(r)<C q e~^-V\ 



(9) 



where 



C q = \ d imn\ K 3 [2 l+m+n l\m\n\ ( l q +m+n ] 1/2 , (10) 



irnn 



the constant K = 1.09, and 



( q I + m + n = 
h( q otherwise. 



(11) 



The overlap extent R° is the value beyond which nu- 
merical evaluation of the distribution p q yields a value 
less than r; 



Cq e~W 2 = r. 



(12) 



For QCTC, errors in the electrostatic potential due to 
penetration errors must be considered. For this purpose, 
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FIG. 1: Behavior of the overlap extent R° and the penetration 
extent RF as a function of r/C q for an s-type Gaussian with 
exponent £ = 1. For small C q (occurring for example due 
to a large atom-atom separation and/or small density matrix 
prefactor), R° goes to zero at the origin and its distribution is 
eliminated, while R p goes slowly to zero due to the Coulomb 
singularity. 



the penetration extent is introduced, satisfying the 
equation 



C a 



/ ~\3/2 

(t/C,J S(r 



R *\ 



dr 



(13) 

In both HiCu and QCTC, the density-tree is con- 
structed by recursively splitting the largest box dimen- 
sion, until each primitive has been resolved. Then the 
primitive AABBs are computed from their extents and 
merged recursively back up the tree. For HiCu, this is all 
there is to it, but for QCTC multipole moments are also 
translated to a common center and recursively merged 
up the tree. Also, when computing matrix elements of J, 
the primitive bra AABB is computed with R°, while the 
-RP are used to construct AABBs of the density-tree. 

In Fig. 1, differences between the penetration and over- 
lap extent are shown for a diffuse s-type Gaussian. For 
large extents, such as those encountered in a static FMM- 
type error bound, the two extents behave in a similar way. 
However, with aggressive use of the multipole approxima- 
tion as in QCTC, the distinction becomes critical. 



IV. PERIODIC QUANTUM COULOMB SUMS 

In the T-point approximation, elements of the periodic 
Coulomb matrix are 

Jab = [ dr [ dr' Pab (r) |r - r'f Vtot (r') (14) 

JVvc J 

= E// *drVr 6 (r)|r-r'rVrot(r' + R) 



where ptot is the total, periodic density including both 
electronic and nuclear terms. These integrals involve in- 
finite summation over the lattice vectors R, and must 
be handled with care. There are at least two main ap- 
proaches to handling this summation: Multipole expan- 
sion of the Ewald potential or Ewald-like summation of 
the multipole expansion. Expansion of the Ewald poten- 
tial yields tin foil (TF) boundary conditions, requires re- 
ciprocal and real space summation with every J build, 
and scales as 0(A 3 / 2 ). An alternative is the Ewald- 
like summation of the multipole interaction tensor, which 
was first described by Nijboer and De Wette (NDW) 24 i 25 
and later reviewed and extended by Challacombe, White 
and Head-Gordon^ 3 - to lattice summation of the irregular 
solid harmonic multipole interaction tensor. This Ewald- 
like summation is taken over the periodic far field, Vpff, 
and is equivalent to a direct lattice summation (not a true 
Ewald sum) excluding an inner region, V\ n , surrounding 
the unit cell. This inner region has been subtracted to 
avoid penetration errors and to guarantee convergence 
of the multipole expansion. With the summed interac- 
tion tensors cheaply precomputed and reused, the cost of 
Coulomb summation over the PFF scales as as 0(Np 2 ), 
where p is the order of the multipole expansion. With 
this partition, the A^-scaling periodic quantum Coulomb 
sums involve the contributions 



J = J In 



• PFF 



rTF 



(15) 



corresponding to the three separate regions shown in 
Fig. 2. Here, J In is computed using the fast O(NlgN) 
QCTC algorithm outlined previously in Sect ion 1TTT1 Con- 
struction of J PFF will be developed in the following sec- 
tion, while in Section II V Bl the term J TF , necessary to 
introduce tin-foil boundary conditions, is detailed. 



A. The Periodic Far Field 

By construction, the periodic far field (PFF) term in 
the Coulomb matrix, 



7 PFF 

<J nh 



£ f /drdr'pS(r)|r-r'+Rr\»(r'), 



RePFF 



(16) 

involves charge distributions that are well separated with 
respect to both penetration and the convergence of multi- 
pole expansion errors, as outlined in Fig. 2 and discussed 
in the following. 

With these conditions, and assuming the unit cell is 
centered at the origin, the bipolar multipole expansion 
employing the regular and irregular solid harmonics, O l m 
and M l m respectively is 



(17) 



1=0 

i 



l'=0 



£ £ Or(r)M«'(R)0™(r') 



m— — l m— — V 
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effective multipole interaction tensor 




m?= y, M r(R), as) 

R.£ VpFF 

which can be efficiently computed on the fly for each new 
lattice, both to high accuracy and to high order (large p) 
using the new methods detailed in Appendix ^ Note 
that this is a direct sum of the interaction tensor, and is 
not equivalent to Ewald summation. Nevertheless, with 
this simplification, the 0(p 2 N) working equation 

Ja b FF = ib E °T\Pa b \jr, (20) 

(=0 m=— I 

is obtained, where the intermediate tensor 

JL = (-1)' E E M ? + T' ?' [p? ot ] (21) 

l'=Q m=-V 



□ Unit Cell 



V. 



FIG. 2: Schematic of the regions contributing to JV-scaling 
summation of the Coulomb matrix. The inner cells that make 
up Vin provide a buffer region that guarantees convergence of 
the multipole expansion of Coulomb interactions between the 
unit cell and all cells in Vpff. The periodic far-field region, 
Vpff, is the spherically ordered lattice extending to infinity 
but excluding Vi n . For large cells and/or high order multipole 
expansions, Vi n includes just the unit cell's 27 nearest neigh- 
bors. In FMM notation this corresponds to ws=l. However, 
for smaller cells and/or lower order multipole expansions, Vi n 
tends to a spherical distribution of cells surrounding the unit 
cell. Direct summation over Vpff leads to charges at the 
infinite surface Soo, which must be be canceled by tin-foil 
(conducting) boundary conditions to achieve equivalence with 
Ewald summation. 



Inserting Eq. (|17|l into into Eq. I|lb|) yields 

p i 



J p J F = E EM)' E E ( 



(18) 



RGPFF 1=0 

I' 



l'=0 m=-l 



Y or{p%,}M™tr'(R)of 



Ptot. 



is cheaply precomputed at the start of each Coulomb 
build. 

Because Eq. (|20ll is inexpensive, our strategy is to de- 
fine a minimal buffer region, Vj n , sufficient to control pen- 
etration errors, subtracting effort from the computation 
of J In via QCTC and replacing it with cheaper, multipole 
work in the computation of J PFF . To this end, a fixed in- 
ner region Vi a is constructed from neighboring cells that 
have simple Gaussian overlap with the unit cell, defined 
by the radius R° . As explained in Section II I II for the 
relatively large distances considered at this level the dif- 
ferences between the penetration and overlap extent are 
negligible. With Vi„ fixed, the precision of J PFF is con- 
trolled entirely by the expansion order p. In general p will 
be much higher than the expansion order (~ 5) employed 
by QCTC in computation of J In . With QCTC accuracy 
is controlled on the fly by the MAC and PAC, establish- 
ing a dynamic near/far- field partition, while computation 
of J PFF involves a static, worst-case error dominated by 
the multipole expansion. This static error is controlled 
by using the FMM-like error bound, 



{R°)p+ 1 \R° -2d n 



< T., 



(22) 



to set the appropriate expansion order p. In Eq. 122|l . 
rfmax is the maximum translational distance, C is the 
asymptotic Unsold weight of the total density and t mac 
is the threshold controlling the translation errors. See 
Ref. |32j for development of this expression and further 
explanation of these parameters. 



This expression decouples the complexity of p^ h from p^ t 
through the precomputed multipole moments O" 1 [p^] = 
/ drOp (r) p% (r) and Of [p^\ = J drOf (r) p& (r). 
Following Nijboer and De WetteS^SiS, we introduce the 



B. Tin- Foil Boundary Conditions 

The surface charges created by direct summation over 
Vpff must be canceled to achieve equivalence with Ewald 
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FIG. 3: Transformation between the unit cell with volume 
Vuc (described by the primitive lattice vectors a, b and c) 
and the rectangular integration volume Vn employed by HiCu. 



summation. Achieving this equivalence is more than se- 
mantic, since without tin-foil boundary conditions ma- 
trix elements lack translational invariance and often in- 
cur dramatic charge sloshing instabilities. The correction 
is strongly dependent on ordering of the direct sum; as 
the Nijboer and De Wette method corresponds to spher- 
ical summation due to symmetry of the real/reciprocal 
space partition, the appropriate correction is^l 



$Ew (r) = $ss (r) + 



2tt 
3Vuc" 



(Q-2r-D) 



(23) 



where D is the system dipole moment, Q is trace of the 
system quadrupole and we have assumed origin centering, 
The tin-foil correction to the Coulomb matrix is then 



J. 



TF 
ab 



2tt 
3Vuc~ 



{QS ab -2d ab -T>) 



(24) 



with S a b an element of the overlap matrix, and d a b the 
dipole moment of the distribution p^|. 



V. PERIODIC EXCHANGE-CORRELATION 

The HiCu algorithm is ideally suited for periodic 
boundary conditions, as the unit-cell Vuc can be simply 
transformed into an equivalent rectangular integration 
domain Vu that is the cube-tree's AABB. These volumes, 
shown in Fig. 3, are equivalent due to full periodicity of 
both distributions and density. The integration is then 
simply 



dr pab(r)v^ [p;r] 



(25) 



TABLE I: Comparison of CRYSTAL98 and MondoSCF V- 
point calculations on NaCl at the RBLYP/8-511/8-631G level 
of theory. 



Program 


iVat 


Energy (au) 


Energy/Aat 


MondoSCF 


2f 


-622.39101 


-311.19551 


CRYSTAL98 


2 s 


-622.39114 


-311.19557 


MondoSCF 


8 s 


-2490.0016 


-311.25020 


CRYSTAL98 


8 9 


-2490.0013 


-311.25016 



9 Cubic 



TABLE II: Convergence of the T-point super-cell approx- 
imation for NaCl, computed with MondoSCF at the 
RLDA/STO-3G level of theory. 



Program 


iVat 


Energy (au) 


Energy/A^ 


MondoSCF 


2' 


-610.97536 


-305.48768 




8 9 


-2444.3584 


-305.54480 




16 / 


-4888.7002 


-305.54377 




54 f 


-16499.490 


-305.54611 




64 9 


-19554.956 


-305.54618 




128-^ 


-39109.912 


-305.54618 




216 9 


-65997.977 


-305.54619 


CRYSTAL98 ft 


2> 


-611.09228 


-305.54614 



■'Triclinic 
9 Cubic 

''6x6x6 k-space grid. 



This approach should be contrasted with more con- 
ventional quantum chemical methods for computing 
the exchange-correlation matrix, involving the "Becke 
weights"—, which demand numerical integration over 

Voo. 

While we have written Eq.|2Hlm terms of the exchange- 
correlation potential for simplicity, in practice HiCu em- 
ploys the Pople, Gill and Johnson formulation 18 39 . 

Because the distributions and density both involve a 
double sum over lattice vectors, there will be a large num- 
ber of atom-atom pairs that do not overlap with Vn • A 
similar situation is encountered in the gas phase for par- 
allel versions of HiCu 40 , where each processor has a small, 
local cube-tree that may overlap only a few of the many 
possible atom-atom pairs. The solution to this problem 
again comes from the ray tracing literature, in the form 
of a modified ray-AABB 3 ! and sphere- AABB test 41 . The 
ray- AABB test has been modified into a cylinder- AABB 
test, where the radius of the cylinder is a maximal over- 
lap extent of the atom-atom pair. In the case of a same 
center atom-atom pair, it is of course more appropriate to 
employ a spherc-AABB test. In both cases, overlap be- 
tween the HiCu integration volume and atom-atom pairs 
is established with a negligible prefactor when using these 
tests. 
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TABLE III: Convergence of the F-point super-cell approxima- 
tion for MgO, computed with MondoSCF at the RBLYP/8- 
61G/8-51G level of theory. 



Program 


AT at 


Energy (au) 


Energy/iVat 


MondoSCF 


2> 


-275.09097 


-137.54548 




8" 


-1101.7295 


-137.71618 




16 J 


-2203.6904 


-137.73065 




54 / 


-7437.7989 


-137.73702 




64 9 


-8815.2131 


-137.73771 




128 / 


-17630.430 


-137.73774 




216 9 


-29751.352 


-137.73774 


CRYSTAL98 h 


2> 


-275.47547 


-137.73774 



Triclinic 



9 Cubic 

h 6 x 6 x 6 k-space grid. 



TABLE IV: Convergence of the F-point super-cell approx- 
imation for diamond, computed with MondoSCF at the 
RBLYP/6-21G* level of theory. 
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FIG. 4: Error in the Coulomb energy computed with the Ni- 
jboer and De Wette scheme relative to true Ewald summation. 
Shown is the error in the Coulomb energy versus p with one 
(ws=l) and two (ws=2) layers of cells in Vpff for a periodic 
system of 64 classical water molecules. 
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-303.989 


-37.9986 


16 


-608.667 


-38.0417 


32 


-1218.02 


-38.0632 


64 


-2436.28 


-38.0669 


96 


-3654.59 


-38.0687 


144 


-5482.04 


-38.0697 


216 


-8223.10 


-38.0699 


288 


-10964.1 


-38.0700 


384 


-14618.9 


-38.0700 



VI. RESULTS 

A. Implementation 

All developments were implemented in a serial ver- 
sion of MondoSCF vl.0a92&, a suite of linear scaling 
Quantum Chemistry code. The code was compiled us- 
ing the Portland Group F90 compiler pgf90 v4.2^ with 
the -01 -tp athlon options and with the Gnu C com- 
piler GCC v3.2.2 using the -01 flag. All calculations were 
carried out on a 1.6GHz AMD Athlon running RedHat 
Linuxv9.0^. 

Thresholds controlling the cost to accuracy ratio of 
HiCu and QCTC are set by the accuracy levels LOOSE, 
GOOD and TIGHT, which have been empirically chosen to 
deliver 4-5, 6-7 and 8-9 digits, respectively, of relative ac- 
curacy in the energy. Values of these thresholds are listed 
in Appendix B of Ref . [2lJ . The unmodified two-electron 
threshold r 2B sets the overlap extent R° in Eq. (|12l) and 
the penetration extent R? in Eq. (|13ll . both of which 
control the PAC. As explained in Ref. |32|. the threshold 
t mac controlling the MAC is set as t mac = 10 2 r 2E . The 
HiCu threshold r H i C u likewise sets two sub-thresholds. 
The overlap extent R° in Eq. (|12|) . defining accuracy 



of the density on the grid, is set using 10 -1 t H i C u (t p in 
Ref. 01)- The target relative error defining accuracy of 
the HiCu grid is just T mC u [r r in Ref. 0). It should be 
pointed out that of all the thresholding schemes, those 
governing HiCu are the least conservative; it is a sim- 
ple (and not too expensive) matter to simply tighten the 
HiCu threshold if intermediate accuracies are required. 

The multipole interaction and contraction code used by 
QCTC in the near /far-field partition has been highly op- 
timized by symbolic manipulation and factorization, us- 
ing real arithmetic and expansions through 7'th order in 
the calculation of J In . The computation of J PFF employs 
a general code for multipole contraction, allowing expan- 
sion through p = 64. Eigensolution of the self-consistent- 
field equations has been used throughout, with the cor- 
responding matrix and distribution thresholds given Ap- 
pendix B of Ref. [23. All calculations were performed 
with C\ (no) symmetry, and all results are reported in 
atomic units. 



B. Validation 

The ability of our implementation to reproduce true 
Ewald summation is shown in Fig. 4 for a periodic sys- 
tem of 64 classical water molecules. Note that with both 
the Ewald sum and the Nijboer and De Wette approach, 
ordering the real and reciprocal space sums is critical; 
high order agreement is achieved only when the summa- 
tion proceeds from the smallest to the largest terms. 

The use of Cartesian Gaussian basis sets in many cases 
allows direct numerical comparison of different programs, 
at least to within the approximations, grids, etc pecu- 
liar to a code. Here, we make connection with the pre- 
eminent Gaussian orbital program for periodic calcula- 
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FIG. 5: Computational complexity of the J and K xc matrix 
builds for cubic diamond at the RBLYP/6-21G* level of the- 
ory at the GOOD accuracy level. 
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FIG. 6: Relative error with system size for RBLYP/6-21G* 
diamond at the GOOD accuracy level. 



tions, CRYSTAL98i£. Calculations have been carried 
out largely with basis sets optimized for the condensed 
phased, which tend to have less diffuse valence func- 
tions. Tables lTllIII make a direct comparison with CRYS- 
TALS for NaCl and MgO obtained with the Mon- 
DOSCF TIGHT precision level. For the CRYSTAL98 
calculations, we used the following threshold parameters: 
T0LDENS=10, T0LP0T=10, TDLGRID=15 and BASIS=4. The 
BASIS parameter determines the auxiliary functions used 
to fit the exchange-correlation potential 

In Table [I] comparison is made for T-point NaCl 
with the 8-51 basis set for sodium, the 8-631Gi£ 
basis set for chlorine and using the restricted BLYP 
functionalist. Next, in Table ITT1 convergence of the su- 
percell T-point approximation is demonstrated for NaCl 
with the STO-3G basis set and the restricted local den- 
sity approximation. Then, in Table IIIII convergence of 
the supercell T-point approximation to the k-space in- 
tegration result is demonstrated for MgO, using the 8- 
basis set for magnesium, the 8-51G™ basis set for 
the oxygen, and the restricted BLYP functional. The 
primitive lattice coordinates for these systems are given 
in Ref. 

Finally, in Table [[V] convergence of the supercell T- 
point approximation is shown for diamond at the GOOD 
accuracy level, using the restricted BLYP functional and 
the 6-21G*^a basis set. Since MondoSCF employs 6- 
d and 10-/ functions, while CRYSTAL98 employs b-d 
and 7-f, we were not able to make a direct comparison 
for this basis set. 



C. Scaling and accuracy 

Demonstrating linear scaling at the outset, Fig. 5 
shows the CPU time for J and K xc builds with 



RBLYP/6-21G* diamond at its standard density. These 
timings correspond to a GOOD level of accuracy, targeting 
6 digits in the total energy and corresponding to the val- 
ues listed in Tabl dlVl Shown in Fig. 6 is the precision 
of the computed energies, obtained by performing a sec- 
ond set of calculations with all thresholds reduced by one 
order of magnitude. For these calculations, the largest 
source of error is the numerical integration performed 
by HiCu, as the QCTC thresholds are significantly more 
conservative. 



VII. CONCLUSIONS 

We have extended linear scaling quantum chemical 
methods for computation of exchange-correlation and 
Coulomb matrices to periodic boundary conditions at the 
r-point. These methods have demonstrated an early on- 
set of linear scaling and error control for diamond, allow- 
ing calculations up to 512 atoms at the RBLYP/6-21G* 
level of theory. In both cases, this early onset of lin- 
ear scaling has been enabled by the use of modern data 
structures such as the k-d tree, together with reliable er- 
ror estimates for the Gaussian extent. These algorithms 
have been parallelized^^, demonstrating high efficien- 
cies up to 128 processors, and have been used recently to 
determine the T — OK equation of state for pentaerythri- 
tol tetranitrate^ at the RPBE/6-31G** level of theory 
and a GOOD accuracy level. 

While this contribution has focused on demonstrating 
linear scaling for diamond, the methods presented here 
work for slabs and wires as well, using methods for com- 
putation of the two- and one-dimensional the M. tensor 
outlined in Appendix 1X1 Our experience has shown that, 
for the same number of atoms, these lower dimensional 
systems run much faster. 
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APPENDIX A: COMPUTATION OF THE M 
TENSOR 



Following Nijboer and De Wett o 24 i 2 ^ , and later Chal- 
lacombe, White and Head-Gordon 33 (CWHG), we begin 
with the partition 



1 



r i+ 

involving the functions 

Qi{fi,r) 

and 



r(Z+l,/?V) 

r(/ + |) H+i 

2/ 



(Al) 



(A2) 



(A3) 



where Y is the gamma function, 7 is the incomplete 
gamma function^ and is a parameter controlling the 
partition. With this separation of length scales, the lat- 
tice sum defining the multipole interaction tensor, M.7 1 , 
may be expressed as 



Mr = Yl M n R ] 

Ft£ VpFF 

= Yl p r (cos e R ) 



Qi 09, |R|) 



R.G VpFF 

J2 Pr(cose R )e im ^^(B,\IL\) 

R.€: VpFF 



(A4) 



Following CWHG, this expression can be further devel- 
oped into real and reciprocal space terms: 

M T = J2 ^ m (cosMe im0R ^(/3,|R|) 

R-IEVpFF 

]T Pr(cosMe mi ^(A|R|) 



V VC T ll 



2) 
,1-2 



J2 \Gf- 2 e-^^P l (cos0 G )e l 

G#{0} 



(A5) 



where G are reciprocal lattice vectors. With an appro- 
priate choice of /3 ~ y/n/ (Vuc) 3 , and summing terms 
from smallest to largest, the periodic multipole interac- 
tion tensor can be computed to high precision assum- 
ing an accurate representation of the incomplete gamma 
function. In previous work by CWHG, the upward re- 
cursion 



r (m + 1, x) = mY (to, x) + x m e 



(A6) 
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was used, which results in a loss ol precision for large val- 
ues of x and to, demanding extended precision arithmetic 
and precluding on the fly computation. This problem is 
overcome by analytically summing the gamma function, 
collecting terms, and then rewriting it as 



r j to+ ~, xj = T (m + i 



i — m— 1 

erfc +J-Y1 ( S 

„ ' 77 n=0 



x e 



(A7) 



where the terms 



r(n + |) 



(A8) 



are simply pretabulated. This version of the gamma func- 
tion is both easy to program and precise, even for large 
values of x or to. 

In one dimension, the M. tensor can be computed an- 
alytically as 



1 



i7"(co S fl )e im *° 



P™ (cos (6» + f)) e* ro ^° +7r ) y, 1 



J+i 



n— no 
n - 1 



Qp [ao.flo.^o] KG + 1) - £ "TTT 



n=l 



(A9) 



where ao, #o and <^o are the initial box dimension and 
angles which are independent of the summation, and £ is 
the Riemann zeta function 53 . 

In two dimensions, the Fourier integrals for the calcu- 
lation of the M. tensor become more complicated. Taking 
the limit as the box dimension in the non-periodic direc- 
tion goes to infinity (the z direction in the following), we 
obtain from Eq. IA5I 



M] 



£ (i-m)!ir(coBfe)e^eiC9,|R|; 

R.<E VpFF 

£ {l - m)\ (cos6 R ) e*"**-^ 09, |R|) 



/oo 
dG z |G 
. ... -oo 



7T 2 |G| 2 

* _ 32 — 



■ e p 2 



P z m (cos6» G )e l 



(A10) 



where 



A vc r (i + 1) 



(All) 



and Aye is the area of the cell along the non-periodic 
direction. In practice, we carry out numerical evaluation 
of this integral. 



